Longitudinal biomedical data are often characterized by sparse time grids and individual-specific development patterns. Specifically, in epidemiological cohort studies and clinical registries, we face the question of what can be learned from the data in an early phase of the study, when only a baseline characterization and one follow-up measurement are available. Inspired by recent advances that allow combining deep learning with dynamic modeling, we investigate whether such approaches can be used to uncover complex structure, in particular for an extreme small-data setting with only two observation time points per individual. Irregular spacing in time can then be used to obtain more information on individual dynamics by leveraging similarity between individuals. We briefly review how variational autoencoders (VAEs), as a deep learning approach, can be linked to ordinary differential equations (ODEs) for dynamic modeling, and then specifically investigate the feasibility of such an approach for providing individual-specific latent trajectories, by incorporating regularity assumptions and similarity between individuals. We also provide a description of this deep learning approach as a filtering task, to give a statistical perspective. Using simulated data, we show to what extent the approach can recover individual trajectories from ODE systems with two and four unknown parameters and infer groups of individuals with similar trajectories, and where it breaks down. The results indicate that such dynamic deep learning approaches can be useful even in extreme small-data settings, but need careful adaptation.
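As a minimal illustration of why irregular spacing between a baseline and a single follow-up measurement can still identify individual dynamics, consider a hypothetical one-parameter ODE system (simple exponential decay, chosen for illustration; it is not one of the systems studied above), where the individual rate is available in closed form from two observations:

```python
import numpy as np

# Hypothetical example: each individual i follows dx/dt = -k_i * x
# (exponential decay) with an individual-specific rate k_i. Given a
# baseline value x(t0) and one follow-up x(t1), the single unknown
# parameter is identified in closed form:
#   k_i = log(x(t0) / x(t1)) / (t1 - t0)

rng = np.random.default_rng(0)

def recover_rate(x0, x1, t0, t1):
    """Recover the decay rate from two noiseless observations."""
    return np.log(x0 / x1) / (t1 - t0)

# Simulate a small cohort with similar dynamics and irregular
# individual-specific follow-up times, as in the setting above.
true_rates = rng.normal(loc=0.5, scale=0.05, size=20)
t0 = 0.0
t1 = rng.uniform(1.0, 3.0, size=20)   # irregular spacing across individuals
x0 = np.ones(20)
x1 = x0 * np.exp(-true_rates * (t1 - t0))

estimated = recover_rate(x0, x1, t0, t1)
```

With more unknown parameters than observation time points per individual, this closed-form identification is no longer possible, which is where pooling information across similar individuals becomes essential.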
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
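The combination of optimism and pessimism for active exploration can be sketched in a much simpler one-step kernelized regression setting (a hedged illustration of the underlying idea, not the AE-LSVI algorithm itself): maintain an optimistic upper bound and a pessimistic lower bound on the unknown function, and query where the two disagree most.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.3):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def confidence_bounds(X_train, y_train, X_query, lam=1e-3, beta=2.0):
    """Kernel ridge mean plus/minus a width-based exploration bonus."""
    K = rbf_kernel(X_train, X_train)
    k_q = rbf_kernel(X_query, X_train)
    reg = K + lam * np.eye(len(X_train))
    alpha = np.linalg.solve(reg, y_train)
    mean = k_q @ alpha
    # Posterior-style predictive width at each query point.
    K_inv = np.linalg.inv(reg)
    var = 1.0 - np.einsum('ij,jk,ik->i', k_q, K_inv, k_q)
    width = beta * np.sqrt(np.clip(var, 0.0, None))
    return mean - width, mean + width   # pessimistic, optimistic

rng = np.random.default_rng(1)
X_train = rng.uniform(0, 1, size=(5, 1))
y_train = np.sin(6 * X_train[:, 0])
X_query = np.linspace(0, 1, 200)[:, None]

lcb, ucb = confidence_bounds(X_train, y_train, X_query)
# Active exploration: query where optimism and pessimism disagree most.
next_x = X_query[np.argmax(ucb - lcb)]
```

In the full algorithm this idea is applied to value functions over an entire state space via least-squares value iteration, which is what yields the uniform near-optimality guarantee.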
A default assumption in reinforcement learning and optimal control is that experience arrives at discrete time points on a fixed clock cycle. Many applications, however, involve continuous systems where the time discretization is not fixed but instead can be managed by a learning algorithm. By analyzing Monte-Carlo value estimation for LQR systems in both finite-horizon and infinite-horizon settings, we uncover a fundamental trade-off between approximation and statistical error in value estimation. Importantly, these two errors behave differently with respect to time discretization, which implies that there is an optimal choice for the temporal resolution that depends on the data budget. These findings show how adapting the temporal resolution can provably improve value estimation quality in LQR systems from finite data. Empirically, we demonstrate the trade-off in numerical simulations of LQR instances and several non-linear environments.
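The approximation-error side of this trade-off can be seen in a toy scalar example (a hedged sketch, not the paper's analysis): for a linear system with quadratic cost, the true value has a closed form, and a Riemann sum over a time grid of step `h` approximates it with an error that shrinks as the discretization is refined. With noisy data, the statistical error behaves differently in `h`, which is what creates the budget-dependent optimal resolution.

```python
import numpy as np

# Toy illustration: scalar linear system dx/dt = -x with quadratic
# cost J = integral_0^T x(t)^2 dt. The exact trajectory is
# x(t) = e^{-t}, so J has a closed form, and a left-endpoint Riemann
# sum over step h approximates it with O(h) error.

T = 2.0
J_true = (1.0 - np.exp(-2.0 * T)) / 2.0  # closed-form value of the integral

def discretized_cost(h):
    t = np.arange(0.0, T, h)
    x = np.exp(-t)               # exact continuous-time trajectory
    return h * np.sum(x**2)      # left-endpoint Riemann sum

# Approximation error at successively finer temporal resolutions.
errors = {h: abs(discretized_cost(h) - J_true) for h in (0.5, 0.1, 0.02)}
```

Refining `h` monotonically shrinks this approximation error; the statistical error of a Monte-Carlo estimate from finite noisy data does not follow the same behavior, so under a fixed data budget the best resolution is an interior optimum rather than "as fine as possible".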
Tuning the machine parameters of particle accelerators is a repetitive and time-consuming task that lends itself to automation. While many off-the-shelf optimization algorithms are available, in practice their use is limited because most methods do not account for safety-critical constraints in every iteration, such as limits on the loss signal or on the step size of the parameters. One notable exception is safe Bayesian optimization, a data-driven tuning approach that works with noisy feedback. We propose and evaluate a step-size-limited variant of safe Bayesian optimization at two research facilities of the Paul Scherrer Institut (PSI): a) the Swiss Free Electron Laser (SwissFEL) and b) the High Intensity Proton Accelerator (HIPA). We report promising experimental results on both machines, tuning up to 16 parameters subject to constraints.
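The step-size-limited idea can be sketched schematically (a hypothetical toy loop with a random-search stand-in for the Bayesian acquisition step, not the PSI implementation): each iteration may only propose candidate settings within a small ball around the current parameters, and candidates predicted to violate a safety threshold are never evaluated on the machine.

```python
import numpy as np

rng = np.random.default_rng(42)

def tune(objective, safety, x0, max_step=0.1, threshold=0.0, iters=50):
    """Greedy tuning loop with a step-size limit and a safety check.

    Random search stands in here for the Bayesian-optimization
    acquisition step of the real method.
    """
    x = np.asarray(x0, dtype=float)
    best = objective(x)
    for _ in range(iters):
        # Step-size limit: perturbations stay inside a small trust ball.
        cand = x + rng.uniform(-max_step, max_step, size=x.shape)
        if safety(cand) < threshold:   # safety-critical constraint
            continue
        val = objective(cand)
        if val > best:
            x, best = cand, val
    return x, best

# Toy 2-parameter "machine": maximize a smooth signal subject to a
# constraint that keeps the parameters inside the unit disc.
objective = lambda x: -np.sum((x - 0.4) ** 2)
safety = lambda x: 1.0 - np.sum(x**2)
x_opt, val = tune(objective, safety, x0=np.zeros(2))
```

The real method replaces the random perturbations with a Gaussian-process-based safe acquisition, so that the safety constraint is respected with high probability despite noisy feedback.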